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1 Introduction 

The  propagation  of  sound  waves,  defined  as  an  oscillatory  motion  with  small  amplitude  in  a 
compressible  fluid  [1],  can  be  described  by  the  linearized  Euler  equations  (lee),  under  the  as- 
sumptions that  there  is  no  feedback  to  the  mean-flow  and  that  effects  of  viscosity  and  heat 
conduction  can  be  neglected. 

In  the  field  of  Computational  Aeroacoustics  (CAA),  it  is  widely  recognized  that  the  numerical 
algorithms  applied  to  solve  the  governing  equations  must  have  sufficiently  low  numerical  disper- 
sion and  dissipation  in  order  to  accurately  simulate  the  propagation  of  aeroacoustic  information. 
To  this  end,  higher-order  schemes  can  be  used.  The  Discontinuous  Galerkin  (dg)  [2,  3]  method 
is  an  ultimately  compact  finite-element  method  which  can  be  applied  efficiently  on  unstructured 
meshes,  thus  allowing  geometrically  complex  problems  to  be  handled.  Higher-order  (higher 
than  second-order)  accuracy  can  be  obtained  relatively  easily  because  of  the  compactness  of 
the  method,  at  the  penalty,  however,  of  an  increasing  number  of  unknowns  per  element.  An 
extensive  description  of  the  application  of  the  Discontinuous  Galerkin  method  in  the  field  of 
CAA  is  given  by  Atkins  et  ai.  [4,  5,  6,  7]. 

The  results  presented  in  the  present  paper  are  obtained  with  the  computer  code  DIGS3D, 
which  is  based  on  a numerical  algorithm  developed  to  solve  the  lee  in  three  dimensions.  For 
the  spatial  discretization  of  the  lee  the  Quadrature-free  Discontinuous  Galerkin  method  has 
been  applied,  while  the  time  integration  is  performed  by  a four-step,  low-storage  Runge-Kutta 
algorithm.  At  present  the  algorithm  is  second-order  accurate  in  both  space  and  time.  The 
numerical  algorithm  has  already  been  applied  to  a three-dimensional  broadband  cavity-noise 
prediction  problem  [8],  where  it  is  part  of  a three-step  method.  In  the  three-step  method,  the 
first  step  provides  the  time-averaged  RANS-solution  from  which,  in  the  second  step,  turbulent 
aeroacoustic  source  terms  for  the  lee  are  obtained.  In  the  third  step  digs3d  is  used  to  simulate 
the  propagation  of  the  aeroacoustic  disturbances  produced  by  the  source  field. 

The  work  presented  in  this  paper  can  be  regarded  as  a continuation  of  the  work  presented 
in  [9].  In  [9]  two  verification  problems,  the  convection  of  a 2D  compact  acoustic  disturbance 
and  the  radiation  from  a three-dimensional  harmonic  monopole  source,  were  presented.  One  of 
the  conclusions  drawn  in  that  paper  is  that  acceleration  of  the  algorithm  to  relax  requirements 
on  computational  effort  was  needed.  Here  we  present  results  obtained  for  the  first  of  the  two 
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aforementioned  verification  problems,  obtained  with  a parallelized  version  of  -the  code.  The 
algorithm  is  parallelized  employing  MPI  (Message  Passing  Interface)  and  the  code  is  run  on  a 
1024-CPU  platform  (teras  [10]).  A more  thorough  description  of  parallelizing  a DG-based  CAA- 
algorithm,  employing  mpi,  is  presented  in  [11].  The  main  objectives  of  the  work  presented  in  the 
present  paper  are  (continuation  of)  the  verification  of  the  numerical  algorithm  and  conducting 
a performance  test  of  the  parallelized  algorithm  on  teras. 

The  outline  of  the  paper  is  as  follows:  In  the  first  section  the  governing  equations,  i.e.  the 
Linearized  Euler  equations  (lee),  are  presented.  In  the  second  section  the  Quadrature-free  Dis- 
continuous Galerkin  method  is  briefly  described.  The  next  section  describes  the  convection  of  a 
two-dimensional  compact  acoustic  disturbance  in  a uniform  mean  flow.  The  analytical  solution 
of  the  problem  is  presented  and  the  obtained  numerical  results  are  compared  with  the  analytical 
solution.  Furthermore  results  of  a performance  test  on  teras  are  presented.  Subsequently, 
concluding  remarks  and  suggestions  for  further  research  are  given. 


2 Linearized  Euler  Equations 


Consider  the  dimensionless  linearized  Euler  equations  (lee)  in  conservation  form  in  three  spatial 
dimensions 


l<u>4u+^f'<u>=s’ 


U(x,  t),  (x,  t)  G Cl  x (0,  T ), 


(1) 


where 

Fy(U)  = Ay(U0)U,  Ay  G JR5  x J?5,  (2) 

with  initial  and  boundary  conditions.  Here  S (G  JR5)  is  the  source  term  for  the  lee,  ft  G H?  is 
an  open  domain  with  boundary  dft  and  t € (0 ,T)  denotes  time.  The  summation  convention  is 
used  on  the  repeated  index  j,  where  j = 1,2,3.  U = where  the. components 

of  the  vector  denote  the  dimensionless  acroacoustic  density  perturbation,  the  three  velocity 
perturbation  components  and  the  pressure  perturbation,  respectively.  The  components  of  vector 
Uo  denote  the  dimensionless  quantities  related  to  the  mean  flow  density,  the  three  components 
of  the  mean  flow  velocity  and  the  mean  flow  pressure.  The  matrices  Ay  are  defined  as: 


Ay(U0)  = 


[My 

5ji 

0 

0 

Mj 

0 

0 

6ij 

0 

0 

My 

0 

<% 

0 

0 

0 

My 

hj 

d 

% 

*J'3 

Mj 

(3) 


where  Mi,  M2, M3  are  the  components  of  the  mean  flow  Mach-number  in  x,  y and  z-direction, 
respectively,  and  £,y  denotes  the  Kronecker-delta  symbol. 


3 Quadrature-free  Discontinuous  Galerkin  Method 

In  this  section  we  briefly  describe  the  Quadrature-free  Discontinuous  Galerkin  spatial  discretiza- 
tion. In  [8]  a more  detailed  description  is  presented.  Throughout  this  section  we  have  i G [1,  JVe], 
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k € [0,M]  and  l € [0,Af].  - 

Let  us  define  the  solution  space  V defined  on  f 2,  let  U € V and  let  the  inner  product  on  V 
be  defined  as: 

(u,v)  = / u(x)v(x)dx,  (4) 

from  which  we  can  define  [12]  the  L2-norm  ||u||  = y/(u,u). 

The  domain  0 is  partitioned  into  non-overlapping  elements  f 2 = Uf2i.  Let  Vj,  C F be 
a finite-dimensional  subspace,  spanned  by  the  linearly  independent  basis  functions  6jfc,  and  let 
U/,  € Vh.  is  obtained  by  the  projection  of  U onto  the  subspace  = span{bik}: 

Nc  M 

u/.  = EIi  Vifc(t)6fc(x).  (5) 

i=l  0 

In  the  Discontinuous  Galerkin  formulation  the  unknown  coefficients  are  obtained  by  solving 
the  system  of  equations  given  by: 

(L(Ufc),itt>  = <S,te>,  V(»,  0.  (6) 

Partial  integration  of  Eq. (6)  yields: 

(bu, ~ (|^7 >A-ibik}  vifc  + biibik{A.jnj)vikdr  - (bu, 3),  V(i, k, l),  (7) 

where  = dtli  and  n is  the  unit  outward  normal  to  boundary  r*. 

Closer  inspection  of  both  Eq.(6)  and  Eq.(7)  reveals  that  the  solution  within  an  element  only 
depends  on  information  within  that  element,  i.e.  there  is  no  ’’communication”  between  elements. 
Furthermore  the  global  solution  is,  in  general,  discontinuous  over  an  element  interface.  To 
provide  the  crucial  coupling  and  to  handle  the  discontinuity  at  element  interfaces,  the  boundary- 
normal  flux,  F^U/Yri-j  = AjrijVh,  is  modeled  by  means  of  an  approximate  Riemann  flux 
F j*  ( Ux, , Ufl)nj  ,•  where  U £ and  Ur  represent  on  either  side  of  the  element  interface.  For  the 
present  implementation  of  the  method  we  approximate  the  Riemann  flux  by  the  Lax-Friedrichs 
flux,  which  can  be  written  as: 

Ff  (U^U*)^-  = ^{[F^Ux)  + F^UiOK-  - «(Ufi  - Ux)},  (8) 

where  n points  from  element  L to  element  R and  a is  a positive  quantity  that  is  larger  in 
magnitude  than  the  eigenvalues  of  the  Jacobian  of  |[Fy(U&)  + Fy(Ufl)]rij-. 

For  a quadrature-free  implementation  [4]  of  Eq.(7)  the  source- term  in  the  lee  (also)  has  to  be 
projected  onto  V^: 

Ne  M 

$h  = £ 2 s ik{t)bik{x).  (9) 

i=l  k= 0 

Since  the  lee  are  linear,  Fy(U/v)  is  expanded  in  a natural  way  as  can  be  seen  from  Eq.(7). 
Furthermore,  Atkins  and  Lockard  [5]  report  that  for  the  simulation  of  the  scattering  of  acoustic 
waves  (where  an  assumption  of  linearity  can  be  made)  it  is  sufficient  to  represent  the  mean  flow 
by  a lower-order  polynomial  to  ensure  the  formal  order  properties  of  the  method. 

Johnson  and  Pitkarata  [2]  prove  that  when  the  basis  functions  are  polynomials  of  degree  p,  the 
order  of  accuracy  is  at  least  p + In  most  practical  cases  [4]  the  order  of  accuracy  is  observed 


(SYA)  24-4 


to  be  p + 1.  With  the  relation  between  M,  p and  the  number  of  space-time  dimension  d given 
by  [4]: 

1 d 

M{p,d)  = -l  + ~Y[{p  + k),  (10) 

a'  k= l 


this  implies  that  in  order  to  obtain  second-order  accuracy  we  must  choose  p = 1,  resulting  in 
M = 3 for  d = 3. 

Eq.(7)  is  conveniently  evaluated  through  the  introduction  of  computational  coordinates.  These 
coordinates  are  local  to  a reference  element  for  which  we  choose  an  equal-sided  tetrahedron. 
Within  the  reference  element  we  define  the  basis  set,  consisting  of  monomials,  as  bk  € {1,  £,  rj,  £}. 
The  time  integration  is  performed  by  a second-order  accurate,  four-step,  low-storage  Runge- 
Kutta  algorithm  [13]. 

The- Discontinuous  Galerkin  method,  due  to  its  local  character,  is  very  well  suited  for  parallel 
calculations.  The  present  parallel  implementation  is  based  on  a domain  decomposition  of  the 
unstructured  mesh  into  several  blocks  where  the  calculation  for  each  block  is  performed  on  a 
different  processor.  The  mpi  (Message  Passing  interface)  routines  are  used  to  communicate  data 
between  processors  for  the  flux  calculations  at  the  interfaces  of  elements  belonging  to  different 
partitions.  The  block  partitioning  itself,  is  based  on  the  metis  ([14])  libraries,  and  is  optimized  in 
order  to  achieve  both  a quasi-uniform  loading  on  the  processors,  and  to  lower  the  communication 
costs  during  computation  (minimization  of  the  number  of  interfaces). 


4 Convection  of  a 2D  Compact  Acoustic  Disturbance 

4.1  Problem  description 

In  this  verification  case  the  lee  (Eqs.(l)  to  (3))  are  solved  on  a square  domain  in  which  a compact 
acoustic  perturbation  is  imposed  through  the  initial  conditions.  The  two-dimensional  domain 
has  dimensions  x € [—100,100],  y € [—100,100]  and  the  acoustic  source  is  initially  centered 
at  x = y = 0.  The  mean  flow  is  uniform  with  Mach-number  components  M\  = M — 0.5 
and  M2  = 0,  there  are  no  sources  (S  = 0)  and  the  initial  condition  for  the  2D  solution  vector 
U(x,  y,  t)  = (//,  v!,v',p')T  is  given  by: 


U(z,y,0) 


( f{x>y)  N 

Pxfix.y) 
Pyf{x,y)  ’ 
V f{*>y) 


(11) 


with 

/(x,y) = e~a(xi+y2\  a = ^,  0 = 0.04.  (12) 

This  test  case  has  also  been  addressed  by  Atkins  and  Shu  [4].  Together  with  a vorticity  wave  it 
is  furthermore  described  as  part  of  the  iCASE/LaRC  Workshop  on  Benchmark  Problems  in  Com- 
putational Aeroacoustics  [15].  Note,  however,  that  in  the  benchmark-case  the  initial  velocities, 
v!  and  1/,  are  taken  equal  to  zero. 
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4.2  Analytical  solution 


The  lee,  together  with  the  conditions  described  above,  can  be  transformed  into  the  wave  equa- 
tion: 


D2q 

Dt2 


-V2<?  = 0, 


with 


D 

Dt 


d_ 

dt 


+ M 


d_ 
dx ’ 


(13) 


where  q is  either  one  of  the  primitive  variables  fJ,  u',  t/  or  p' . 

Upon  introducing  a coordinate  system  moving  with  the  mean  flow:  r = t,  £ = x — M t,  tj  = j/, 
the  wave  equation  can  be  written  as: 


£•  ov 

dr 2 * 


0, 


d ,7 


*-<*-*> 


(14) 


Next  we  introduce  polar  coordinates  (r,  6),  where  £ = r cos(0)  and  17  = r sin(0)  and  assume 


that  q is  independent  of  0: 


d2q  _ , IfM  0 

dr2  ydr2  r dr  J 


(15) 


Instead  of  solving  Eq.(15)  for  one  of  the  primitive  variables  we  will  solve  it  for  the  velocity 
potential  <j>(ryt),  which  also  satisfies  the  wave  equation.  With  u'  = ||  and  v'  = ^ the  primitive 
variables  are  related  to  the  velocity  potential  by 


dA=u> 

dr  r? 


(16) 


where  u'r  is  the  radial  velocity  component.  The  initial  conditions  can  be  obtained  from  these 
two  relations 

^,0)  = £u'r(r)dr  = -^-e-ar\  (17) 

^(r,0)  = -p'(r,0)  = -e-ar\  (18) 

The  solution  of  Eq.(15)  for  q = <j>  with  the  initial  conditions  given  by  Eq.(17)  and  Eq.(18)  can 
be  obtained  conveniently  by  employing  the  Hankel  transform: 


roo 


^(r.T)=/  AJo(Ar)$(A,r)dA, 

Jo 

poo 

$(A  ’T)  = y0  rJo(\r)<b(r,T)dr, 


(19) 

(20) 


where  Jo  is  the  zeroth-order  Bessel  function  of  the  first  kind.  Upon  applying  the  Hankel  trans- 
form the  problem  reduces  to  solving 


d2$ 

dr2 


+ A2$  = 0, 


(21) 


with 


$(A,0)  = £?(A), 


^(A,0)  = F(A), 


(22) 
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where  E( A)  and  F( A)  are  given  by:  — 

E(X)  = -^f\j0(Xr)c-ar2dr, 

f00  2 

F( A)  = — / r Jo(Ar)e  ar  dr. 

do 

Prom  Gradshteyn  and  Ryzhik  [16]  we  obtain 

rJo(Xr)e~ar'1  dr  — . 

JO  2a 

The  solution  of  Eq.(21)  can  be  written  as: 

$(A,t)  = E(X)  cos  (At)  + sin(Ar). 

A 

Using  Eq.(19)  to  transform  back  from  A to  r we  obtain  for  the  velocity  potential: 

(j>{r,r)  = — -7-  [ AJo(Ar)cos(Ar)e-^  dX+  [ Jo(Ar) sin(Ar)e_:^  dAl . 

2a  1 2a  do  do  J 

The  general  solution,  as  function  of  r and  t,  finally  becomes: 

p'(r,t ) = XJo{Xr)  cos(Xt)e~*Z  dX  - f°°  A2do(Ar)sin(At)e~^  dx\ , 

2a  {Jo  2a  Jo  ) 


(23) 


(24) 


(25) 


(26) 


(27) 


where 


u'(r,t)  = XJi(Xr)  sin(At)e  *«dX  + -^j°°X2Ji(Xr)cx>s(Xt)e  4*  dA  j , (28) 

v'(r,t)  ='  — - { f XJi(Xr)sm[Xt)e~*^  dX+  f X2J\(Xr)cos(Xt)e~*^  tul , (29) 

2ar  (Jo  loc  Jo  ) 

(30) 


r = \J{x-  Mt )2  + y2, 

and  //  = p'.  In  the  above  expressions  Ji  is  the  first-order  Bessel  function  of  the  first  kind 


4.3  Numerical  result 

For  a correct  implementation  of  the  initial  condition  we  have  to  project  U(ar,y,-0),  given  by 
Eq.(ll),  onto  the  basis  functions.  The  integrations  which  then  have  to  be  performed  are  not 
straightforward.  Alternatively,  we  will  approximate  the  initial  solution  by  a second-order  accu- 
rate Taylor-series  expansion  around  the  centroid  of  each  element  as  was  also  done  by  Atkins  and 
Shu  [4]. 

In  order  to  perform  this  2D  calculation  with  the  present  3D  method  all  derivatives  in  the 
z-direction  are  taken  equal  to  zero.  Furthermore  symmetry-plane  boundary  conditions  are  used 
for  the  upper  and  lower  boundary  in  the  z-direction.  On  all  the  other  boundaries  of  the  computa- 
tional domain  characteristic-based  non-reflecting  boundary  conditions  sire  used.  Both  boundary 
conditions  are  described  in  [7].  We  take  for  the  2D  solution  the  result  in  the  plane  z=0. 

The  simulations  have  been  carried  out  on  different  tetrahedral  meshes.  The  physical  domain, 
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Cl,  is  partitioned  into  Ne  identical  tetrahedrons  obtained  by  dividing  Cl  into  equally  sized  cubes, 
which  provides  us  with  a background  mesh,  and  then  dividing  each  cube  into  12  identical  tetra- 
hedrons. In  table  1,  specifications  of  the  different  meshes  are  given.  The  background  mesh 
dimensions  are  given  by  Nx,  Ny  and  Nz,  while  np  denotes  the  number  of  processors  used  in  the 
computation,  h denotes  a characteristic  mesh-size,  and  is  given  by  h = All  simulations  have 
been  carried  out  with  a CFL-number  of  0.15. 


Case 

Nx 

*9 

Nz 

Tetrahedrons 

faces  z — 0 

h 

np 

II 

40 

40 

2 

38,400 

3,200 

l 

40 

2/4/8/16/32 

III 

80 

80 

4 

307,200 

12,800 

1 

80 

* 

IV 

100 

100 

5 

600,000 

20,000* 

r 

100 

2/4/8/16/32/64/128 

V 

120 

120 

6 

1,036,800 

28,800 

1 

120 

64 

VI 

160 

160 

8 

2,457,600 

51,200 

f 

ISO 

128 

* result  taken  from  reference  [9] 
t 2D  result  taken  in  z = 1-plane 


Table  1:  Mesh  specifications  and  number  of  processors  used  for  the  computations  for  the  different 
cases 

Verification  and  accuracy 

In  [9]  case  IV  was  already  considered  with  the  non-parallel  code.  The  results  obtained  with  the 
current  version  of  the  code  showes  no  differences  with  the  results  obtained  with  this  previous 
version  of  the  code. 

Fig.(l)  presents  the  results  obtained  for  p',  along  the  line  x = 10  for  t = 20,  for  the  cases  V and 
VI  as  well  as  the  analytical  solution.  The  location  of  the  disturbance  is  accurately  resolved  in 
both  cases.  The  magnification  of  the  region  -30  < r < —10,  presented  in  Fig.(l.b),  shows  that 
the  result  obtained  for  case  VI  shows  a slightly  better  comparison  with  the  analytical  solution, 
than  the  result  obtained  for  case  V,  as  one  might  have  expected. 

Fig.  (2)  presents  the  results  obtained  for  p'}  along  the  fine  x = 20  for  t = 40,  for  the  cases 
IV  and  VI  as  well  as  the  analytical  solution.  The  results  obtained  for  case  VI  show  a better 
agreement  with  the  analytical  solution  than  those  of  case  IV. 

In  Figures  (1)  and  (2)  the  graph  of  the  analytical  solution  is  obtained  by  approximating  the 
integrals  in  Eq.(27)  by  means  of  a composite  Simpson’s  rule.  The  number  of  quadrature  points 
is  chosen  sufficiently  high,  so  that  further  increasing  the  number  of  quadrature  points  will  not 
be  visible  in  the  figures. 

For  r = 0 Eq.(27)  can  be  evaluated  exactly,  without  any  numerical  approximation.  For  the 
pressure  perturbation  we  obtain  the  analytical  solution  |/(0,20)  = —0.016624864.  We  denote 
the  numerical  approximation  of  pr(0,20),  computed  on  a mesh  with  characteristic  size  h,  by  p'h. 
In  Fig.(3.a)  we  have  plotted  e = \p'h  — p/ (0, 20) | vs.  h~l  on  logaritmic  scales.  The  values  of  the 
characteristic  mesh  size  h related  to  the  various  cases  are  presented  in  table  (1).  The  results 
of  case  III,  V and  VI  almost  lie  on  a straight  line.  The  result  obtained  on  the  coarse  mesh  of 
case  II  only  deviates  a little  from  this  line.  (Note  that  the  result  obtained  for  case  IV  is  not 
taken  into  consideration,  since  the  result  is  not  measured  in  the  plane  z = 0.)  The  slope  of  the 
fine  gives  us  the  order  of  the  method  in  the  point  (r,t)  — (0,20).  The  slope  suggests  5t/l -order 
accuracy  in  the  specific  point  (r,  t)  = (0,20).  In  [17]  Hu  and  Atkins  present  results  of  a detailed 
study  of  spatially  propagating  waves  in  a DG  sheme  applied  to  a ID  system  of  linear  hyperbolic 
equations.  They  report  that  the  phase  error  (of  the  physical  mode)  decays  like  h2p+2,  where  p 
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(a)  (b) 

Figure  1:  Comparison  of  numerical  results,  obtained  for  case  V and  VI  for  x = 10  and  t = 20, 
with  the  analytical  solution  p'(r,  20).  b:  Magnification  of  the  region  -30  < r < —10. 


(a)  (b) 

Figure  2:  Comparison  of  numerical  results,  obtained  for  case  IV  and  VI  for  x = 20  and  t = 40, 
with  the  analytical  solution  p'(r,  40).  b:  Magnification  of  the  region  —50  < r < —30. 


is  the  degree  of  the  polynomials  which  axe  used  as  basis  functions.  They  also  report  that  the 
global  error  measure  (for  the  definition  see  [17])  reduces  at  order  2p  + 1.  We  use  basis  functions 
of  degree  p = 1 which  would  result  in  a decay  of  the  phase  error  like  h4  and  a decay  of  the 
global  error  measure  like  7i3,  assuming  that  the  results  obtained  by  Hu  and  Atkins  in  ID  would 
apply  to  3D.  However,  this  still  does  not  explain  why  we  observe  a decay  of  the  error  like 
Clearly,  further  research  on  this  topic  is  necessary. 

Assuming  that  our  method  is  5^-order  accurate  in  (r,i)  = (0,20),  we  can  apply  Richard- 
son extrapolation  ([18])  to  obtain  a prediction  of  the  exact  solution.  Employing  Richardson 
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(a) 


Figure  3:  Grid  convergence  study  for  the  pressure  perturbation  in  (r,t)  = (0, 20). 

extrapolation  we  assume  that  the  following  holds: 

tfh  = a + bh5.  (31) 

Using  the  numerical  results  of  cases  V and  VI  we  can  obtain  the  coefficients  a and  b.  Coefficient 
a gives  the  prediction  of  the  exact  solution  p'(0, 20),  we  obtain  a = —0.016616442.  The  relative 
error  | V&T  ^ approximately  0.05  %.  Fig.(3.b)  shows  the  polynomial  of  Eq.(31)  together 
with  the  numerical  results  of  the  different  cases. 

Speed-up 

A performance  test  has  been  carried  out  for  case  II  and  IV  on  teras,  which  is  a 1024-CPU 
platform  ([10])  consisting  of  two  512-CPU  SGI  Origin  3800  systems.  It  has  a peak  performance 
of  1 TFlops  (1012  floating  point  operations  per  second),  it  is  fitted  with  500Mhz  R14000  CPU’s 
organized  in  256  4-CPU  nodes  and  possesses  1 TByte  of  total  memory.  The- speed-up  is  measured 
in  terms  of  the  ratio  of  the  user  CPU-times,  where  the  two  processor-job  serves  as  reference.  From 
Fig.(4.a)  it  can  be  seen  that  near-linear  speed-up  is  obtained  for  case  IV.  Slightly  superlinear 
speed-up  is  obtained  on  4 processors  for  both  case  II  and  IV,  which  is  probably  caused  by  a 
more  efficient  cache  performance.  Fig.(4.a)  shows  furthermore  that  by  dividing  the  domain  of 
case  II  over  more  than  8 processors,  the  number  of  elements  assigned  to  each  processor  becomes 
too  small  and  the  communication  overhead  becomes  apparent.  FVom  Fig.(4.b)  we  observe  that 
for  case  IV  we  have  a near-linear  speed-up,  up  to  64  processors.  On  128  processors  the  relative 
speed-up  (relative  to  np  = 2)  has  dropped  to  60%. 

On  128  processors  approximately  8.109  floating  point  operations  were  performed  per  second 
(8  Gflops).  For  the  performance  test  for  case  IV,  the  computation  involved  1000  time  steps. 
The  computation  took  less  than  8 minutes  on  128  processors  (elapsed  time).  TERAS  has  a 
peak-performance  of  1 Gflops  per  processor,  for  case  IV  the  code  was  observed  to  operate,  on 
average,  at  10  % of  this  peak  when  up  to  8 processors  were  used.  Using  increasingly  more 
processors  (more  than  8)  resulted  in  a gradual  drop  in  performance  to  approximately  6%  of  the 
peak-performance,  when  128  processors  were  used,  which  is  thought  to  be  due  to  the  increased 
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(a) 


(b) 


Figure  4:  Speed-up  measured  for  case  II  (100  time  steps)  and  case  IV  (1000  time  steps)  on 
Origin  3800.  a : Result  for  case  II  and  IV  for  up  to  32  processors,  b:  Result  for  case  IV  for  up 
to  128  processors. 


communication  overhead. 


5 Concluding  Remarks 

The  results  presented  in  the  present  paper  are  obtained  with  the  computer  code  digs3d,  which 
is  based  on  a numerical  algorithm,  developed  to  solve  the  Linearized  Euler  equations  (lee) 
in  three  dimensions.  For  the  spatial  discretization  of  the  LEE  the  Quadrature-free  Discontinu- 
ous Galerkin  method  has  been  applied,  while  the  time  integration  is  performed  by  a four-step, 
low-storage  Runge-Kutta  algorithm.  The  algorithm  is  second-order  accurate  in  both  space  and 
time.  The  main  objectives  of  the  work  presented  in  the  present  paper  are  (continuation  of)  the 
verification  of  the  numerical  algorithm  and  conducting  a performance  test  of  the  parallelized 
algorithm  on  teras. 

As  a verification  problem,  the  convection  of  a 2D  compact  acoustic  disturbance  has  been 
considered  on  different  meshes  employing  different  numbers  of  processors  on  teras,  a 1024-CPU 
platform  consisting  of  two  512-CPU  SGI  Origin  3800  systems.  The  obtained  numerical  solutions 
show  very  good  agreement  with  the  analytical  solution.  A grid  convergence  study  for  the  pres- 
sure perturbation  in  the  centre  of  the  decayed  and  converted  2D  Gaussian  pulse  at  dimensionless 
time  t = 20,  suggests  that  the  numerical  result  in  that  point  is  5t/l-order  accurate  in  the  charac- 
teristic mesh  size.  The  reason  for  this  unexpected  high  accuracy,  obtained  in  that  specific  point 
with  the  second  order  accurate  algorithm,  is  unknown  at  present  and  needs  further  investigation. 

The  conducted  performance  test  showes,  that  for  a medium-sized  mesh  (600,000  elements), 
near-linear  speed-up  is  obtained  when  up  to  64  processors  are  used.  Using  128  processors  (for 
the  same  problem),  showes  that  a computational  speed  of  approximately  8.109  floating  point 
operations  per  second  (8  Gflops)  is  obtained. 
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In  addition  to  further  verification  and  validation,  future  activities  will  probably  include  opti- 
mization of  the  algorithm  and  extension  of  the  algorithm  towards  higher-order  accuracy. 
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Reference  # of  Paper:  24 
Discusser’s  Name:  Dr.  Bastiaan  Oskam 

Author’s  Name:  Mr.  Carl  P.  A.  Blom 

Question: 

How  great  is  the  computational  complexity  of  the  DG  method  in  comparison  with  the 
DRP  scheme  on  the  same  finite  element  mesh?  Is  it  larger  than  a factor  of  four? 

Answer: 

That  will  depend  on  the  actual  wave  propagation  properties  of  one  method  in  relation 
to  the  other.  I can’t  give  a number  at  this  stage  because  we  don’t  have  analytic  values  for  the 
dispersion  and  dissipation  errors  in  three  dimensions.  (I  also  don’t  know  if  these  numbers  are 
available  for  the  DRP  scheme  in  three  dimensions.)  However,  on  structured  meshes  the  DRP 
scheme  would  be  expected  to  be  more  efficient. 


Discusser’s  Name:  Prof.  Ir.  Joop  Slooff 

Author’s  Name:  Mr.  Carl  P.  A.  Blom 

Question: 

Apparently  you  do  not  know  precisely,  on  an  analytical  basis,  what  the  order  of 
accuracy  is  of  your  scheme.  You  had  to  do  a numerical  mesh  convergence  experiment  to  find 
out.  Then,  how  do  you  intend  to  improve  the  order  of  accuracy  of  your  method? 

Answer: 

We  did  a one  dimensional  wave  propagation  analysis  that  does  give  us  that  kind  of 
information.  The  one  dimensional  analysis  is  on  a structured  mesh.  We  do  not  know  if  we  can 
apply  the  results  obtained  in  one  dimension  to  three  dimensional  problems.  We  also  don’t 
know  if,  and  how,  it  applies  to  unstructured  meshes.  Dr  Hu  [Hu,  F.  Q.,  Hussaini,  M.  Y.,  and 
Raestarinera,  P.,  “An  analysis  of  the  discontinuous  Galerkin  method  for  wave  propagation 
problems,”  Journal  of  Computational  Physics,  141,  1998,  pp.  921-946]  has  presented  results  of 
his  one-dimensional  analysis  of  wave  propagation  and  showed  very  promising  results.  We  also 
did  this  grid  study  to  enable  us  to  compare  with  analytical  results  that  might  be  obtained  at  a 
later  stage. 


